A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis
Abstract
Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type-specific developmental gene expression patterns.
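The idea of classifying sequences by the presence or absence of binding-site motifs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the classifier used in the study: the motif consensus strings, the toy sequences, and the choice of a Bernoulli naive Bayes model are all hypothetical placeholders.

```python
import math
import re

# Hypothetical consensus motifs (IUPAC codes); placeholders, not motifs from the study.
MOTIFS = {
    "motif_A": "TGACGT",
    "motif_B": "GGAAR",      # R = A or G
    "motif_C": "TCACACCT",
}

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def presence_vector(seq, motifs=MOTIFS):
    """Binary features: 1 if a motif occurs on either strand, else 0."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return [int(any(motif_regex(m).search(s) for s in strands))
            for m in motifs.values()]


def train_bernoulli_nb(X, y):
    """Fit a Bernoulli naive Bayes model with add-one (Laplace) smoothing."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(len(X[0]))]
        model[label] = (math.log(len(rows) / len(X)), probs)
    return model


def predict(model, x):
    """Return the label with the highest log posterior score."""
    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if v else 1 - p)
                               for p, v in zip(probs, x))
    return max(model, key=score)


# Toy training data: label 1 = "enhancer-like", label 0 = background (made up).
train_seqs = ["AATGACGTCCGGAAGTT", "CCGGAAATTTGACGTAA",
              "ATATATATCGCGCGAT", "TTTTCACACCTAAAA"]
train_labels = [1, 1, 0, 0]

model = train_bernoulli_nb([presence_vector(s) for s in train_seqs], train_labels)
query = "GGTGACGTAAGGAAGCC"
predicted = predict(model, presence_vector(query))
```

With these toy inputs, the query carries the same two motifs as the positive examples and is assigned to that class. A real analysis would use position weight matrices, many more sequences, and held-out evaluation.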
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...
Identification of functional short non-coding RNAs in sheep and goat using bioinformatics methods
MicroRNAs (miRNAs) are small non-coding RNAs that have functional roles in post-transcriptional modification. They regulate gene expression by an RNA interfering pathway through cleavage or inhibition of the translation of target mRNA. Numerous miRNAs have been described for their important functions in developmental processes in numerous animals, but there is limited information about sheep an...
Novel Isatin-based activator of p53 transcriptional functions in tumor cells
Bioinorganic medicinal chemistry remains a hot field for research aimed at developing novel anti-cancer treatments. Discovery of metal complexes as potent antitumor chemotherapeutics such as cisplatin led to a significant shift of focus toward organometallic/bioinorganic compounds containing transition metals and their chelates as novel scaffolds for drug discovery. In that way, transition met...
The myocardin family of transcriptional coactivators: versatile regulators of cell growth, migration, and myogenesis.
The association of transcriptional coactivators with sequence-specific DNA-binding proteins provides versatility and specificity to gene regulation and expands the regulatory potential of individual cis-regulatory DNA sequences. Members of the myocardin family of coactivators activate genes involved in cell proliferation, migration, and myogenesis by associating with serum response factor (SRF)...
Understanding Co-expressed Gene Sets by Identifying Regulators and Modeling Genomic Elements
Genomic researchers commonly study complex phenotypes by identifying experimentally derived sets of functionally related genes with similar transcriptional profiles. These gene sets are then frequently subjected to statistical tests of association relating them to previously characterized gene sets from literature and public databases. However, few tools exist examining the non-coding, regulato...